AITopics | Mato Grosso do Sul

Collaborating Authors

Mato Grosso do Sul

A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case

Correia, Joao, Coutinho, Daniel, Castelluccio, Marco, Barbosa, Caio, de Mello, Rafael, Sarma, Anita, Garcia, Alessandro, Gerosa, Marco, Steinmacher, Igor

arXiv.org Artificial IntelligenceOct-28-2025

The use of Large Language Models (LLMs) to support tasks in software development has steadily increased over recent years. From assisting developers in coding activities to providing conversational agents that answer newcomers' questions. In collaboration with the Mozilla Foundation, this study evaluates the effectiveness of Retrieval-Augmented Generation (RAG) in assisting developers within the Mozilla Firefox project. We conducted an empirical analysis comparing responses from human developers, a standard GPT model, and a GPT model enhanced with RAG, using real queries from Mozilla's developer chat rooms. To ensure a rigorous evaluation, Mozilla experts assessed the responses based on helpfulness, comprehensiveness, and conciseness. The results show that RAG-assisted responses were more comprehensive than human developers (62.50% to 54.17%) and almost as helpful (75.00% to 79.17%), suggesting RAG's potential to enhance developer assistance. However, the RAG responses were not as concise and often verbose. The results show the potential to apply RAG-based tools to Open Source Software (OSS) to minimize the load to core maintainers without losing answer quality. Toning down retrieval mechanisms and making responses even shorter in the future would enhance developer assistance in massive projects like Mozilla Firefox.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.21933

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.40)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.05)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.68)

Industry:

Education (0.92)
Information Technology > Software (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models

de Oliveira, Matheus Vinicius da Silva, Silva, Jonathan de Andrade, Fontao, Awdren de Lima

arXiv.org Artificial IntelligenceOct-1-2025

Large Language Models (LLMs) are widely used across multiple domains but continue to raise concerns regarding security and fairness. Beyond known attack vectors such as data poisoning and prompt injection, LLMs are also vulnerable to fairness bugs. These refer to unintended behaviors influenced by sensitive demographic cues (e.g., race or sexual orientation) that should not affect outcomes. Another key issue is hallucination, where models generate plausible yet false information. Retrieval-Augmented Generation (RAG) has emerged as a strategy to mitigate hallucinations by combining external retrieval with text generation. However, its adoption raises new fairness concerns, as the retrieved content itself may surface or amplify bias. This study conducts fairness testing through metamorphic testing (MT), introducing controlled demographic perturbations in prompts to assess fairness in sentiment analysis performed by three Small Language Models (SLMs) hosted on HuggingFace (Llama-3.2-3B-Instruct, Mistral-7B-Instruct-v0.3, and Llama-3.1-Nemotron-8B), each integrated into a RAG pipeline. Results show that minor demographic variations can break up to one third of metamorphic relations (MRs). A detailed analysis of these failures reveals a consistent bias hierarchy, with perturbations involving racial cues being the predominant cause of the violations. In addition to offering a comparative evaluation, this work reinforces that the retrieval component in RAG must be carefully curated to prevent bias amplification. The findings serve as a practical alert for developers, testers and small organizations aiming to adopt accessible SLMs without compromising fairness or reliability.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.26584

Country: South America > Brazil > Mato Grosso do Sul > Campo Grande (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generating Proto-Personas through Prompt Engineering: A Case Study on Efficiency, Effectiveness and Empathy

Ayach, Fernando, Lameirão, Vitor, Leão, Raul, Felizardo, Jerfferson, Sobrinho, Rafael, Borges, Vanessa, Matsubara, Patrícia, Fontão, Awdren

arXiv.org Artificial IntelligenceJul-14-2025

Proto-personas are commonly used during early-stage Product Discovery, such as Lean Inception, to guide product definition and stakeholder alignment. However, the manual creation of proto-personas is often time-consuming, cognitively demanding, and prone to bias. In this paper, we propose and empirically investigate a prompt engineering-based approach to generate proto-personas with the support of Generative AI (GenAI). Our goal is to evaluate the approach in terms of efficiency, effectiveness, user acceptance, and the empathy elicited by the generated personas. We conducted a case study with 19 participants embedded in a real Lean Inception, employing a qualitative and quantitative methods design. The results reveal the approach's efficiency by reducing time and effort and improving the quality and reusability of personas in later discovery phases, such as Minimum Viable Product (MVP) scoping and feature refinement. While acceptance was generally high, especially regarding perceived usefulness and ease of use, participants noted limitations related to generalization and domain specificity. Furthermore, although cognitive empathy was strongly supported, affective and behavioral empathy varied significantly across participants. These results contribute novel empirical evidence on how GenAI can be effectively integrated into software Product Discovery practices, while also identifying key challenges to be addressed in future iterations of such hybrid design processes.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.08594

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
South America > Brazil > São Paulo (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.68)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

Add feedback

Leveraging GPT-4 for Vulnerability-Witnessing Unit Test Generation

Antal, Gábor, Bán, Dénes, Isztin, Martin, Ferenc, Rudolf, Hegedűs, Péter

arXiv.org Artificial IntelligenceJun-16-2025

In the life-cycle of software development, testing plays a crucial role in quality assurance. Proper testing not only increases code coverage and prevents regressions but it can also ensure that any potential vulnerabilities in the software are identified and effectively fixed. However, creating such tests is a complex, resource-consuming manual process. To help developers and security experts, this paper explores the automatic unit test generation capability of one of the most widely used large language models, GPT-4, from the perspective of vulnerabilities. We examine a subset of the VUL4J dataset containing real vulnerabilities and their corresponding fixes to determine whether GPT-4 can generate syntactically and/or semantically correct unit tests based on the code before and after the fixes as evidence of vulnerability mitigation. We focus on the impact of code contexts, the effectiveness of GPT-4's self-correction ability, and the subjective usability of the generated test cases. Our results indicate that GPT-4 can generate syntactically correct test cases 66.5\% of the time without domain-specific pre-training. Although the semantic correctness of the fixes could be automatically validated in only 7. 5\% of the cases, our subjective evaluation shows that GPT-4 generally produces test templates that can be further developed into fully functional vulnerability-witnessing tests with relatively minimal manual effort. Therefore, despite the limited data, our initial findings suggest that GPT-4 can be effectively used in the generation of vulnerability-witnessing tests. It may not operate entirely autonomously, but it certainly plays a significant role in a partially automated process.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.11559

Country:

Europe > Hungary > Csongrád-Csanád County > Szeged (0.06)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: An Empirical Study on Popular Open-Source Projects

Wang, Yuxuan, Chen, Jingshu, Wang, Qingyang

arXiv.org Artificial IntelligenceMay-22-2025

Command injection vulnerabilities are a significant security threat in dynamic languages like Python, particularly in widely used open-source projects where security issues can have extensive impact. With the proven effectiveness of Large Language Models(LLMs) in code-related tasks, such as testing, researchers have explored their potential for vulnerabilities analysis. This study evaluates the potential of large language models (LLMs), such as GPT-4, as an alternative approach for automated testing for vulnerability detection. In particular, LLMs have demonstrated advanced contextual understanding and adaptability, making them promising candidates for identifying nuanced security vulnerabilities within code. To evaluate this potential, we applied LLM-based analysis to six high-profile GitHub projects-Django, Flask, TensorFlow, Scikit-learn, PyTorch, and Langchain-each with over 50,000 stars and extensive adoption across software development and academic research. Our analysis assesses both the strengths and limitations of LLMs in detecting command injection vulnerabilities, evaluating factors such as detection accuracy, efficiency, and practical integration into development workflows. In addition, we provide a comparative analysis of different LLM tools to identify those most suitable for security applications. Our findings offer guidance for developers and security researchers on leveraging LLMs as innovative and automated approaches to enhance software security.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.15088

Country:

North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.14)
South America > Brazil > Mato Grosso do Sul > Campo Grande (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation

Ferreira, Alessandro dos Santos, Ramos, Ana Paula Marques, Junior, José Marcato, Gonçalves, Wesley Nunes

arXiv.org Artificial IntelligenceMay-22-2025

Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large and manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images. Our proposed pipeline enhances low-resolution imagery while preserving semantic content, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real-ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrated an improvement of over 50% in IoU for low-resolution images, highlighting the effectiveness of our method compared to traditional pipelines.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.15077

Country:

South America > Brazil > Mato Grosso do Sul > Campo Grande (0.04)
South America > Brazil > São Paulo (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)
Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values

Rodríguez, Martín, Rossi, Gustavo, Fernandez, Alejandro

arXiv.org Artificial IntelligenceMay-16-2025

The design and implementation of unit tests is a complex task many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) in automatically generating test cases, comparing them with manual tests. An optimized prompt was developed, that integrates code and requirements, covering critical cases such as equivalence partitions and boundary values. The strengths and weaknesses of LLMs versus trained programmers were compared through quantitative metrics and manual qualitative analysis. The results show that the effectiveness of LLMs depends on well-designed prompts, robust implementation, and precise requirements. Although flexible and promising, LLMs still require human supervision. This work highlights the importance of manual qualitative analysis as an essential complement to automation in unit test evaluation.

boundary value, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.0983

Country:

North America > United States > New York > New York County > New York City (0.05)
South America > Brazil > Mato Grosso do Sul > Campo Grande (0.04)
South America > Argentina > Pampas > Buenos Aires Province > La Plata (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping

Schmidinger, J., Vogel, S., Barkov, V., Pham, A. -D., Gebbers, R., Tavakoli, H., Correa, J., Tavares, T. R., Filippi, P., Jones, E. J., Lukas, V., Boenecke, E., Ruehlmann, J., Schroeter, I., Kramer, E., Paetzold, S., Kodaira, M., Wadoux, A. M. J. -C., Bragazza, L., Metzger, K., Huang, J., Valente, D. S. M., Safanelli, J. L., Bottega, E. L., Dalmolin, R. S. D., Farkas, C., Steiger, A., Horst, T. Z., Ramirez-Lopez, L., Scholten, T., Stumpf, F., Rosso, P., Costa, M. M., Zandonadi, R. S., Wetterlind, J., Atzmueller, M.

arXiv.org Artificial IntelligenceFeb-27-2025

Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging and contentious. Benchmarking studies on multiple datasets are needed to reveal strengths and limitations of commonly used methods. Existing DSM studies usually rely on a single dataset with restricted access, leading to incomplete and potentially misleading conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets from various countries. Each dataset has three target soil properties: (1) soil organic matter or soil organic carbon, (2) clay content and (3) pH, alongside a set of features. Features are dataset-specific and were obtained by optical spectroscopy, proximal- and remote soil sensing. All datasets were aligned to a tabular format and are ready-to-use for modeling. We demonstrated the use of LimeSoDa for benchmarking by comparing the predictive performance of four learning algorithms across all datasets. This comparison included multiple linear regression (MLR), support vector regression (SVR), categorical boosting (CatBoost) and random forest (RF). The results showed that although no single algorithm was universally superior, certain algorithms performed better in specific contexts. MLR and SVR performed better on high-dimensional spectral datasets, likely due to better compatibility with principal components. In contrast, CatBoost and RF exhibited considerably better performances when applied to datasets with a moderate number (< 20) of features. These benchmarking results illustrate that the performance of a method is highly context-dependent. LimeSoDa therefore provides an important resource for improving the development and evaluation of statistical methods in DSM.

algorithm, dataset, limesoda, (15 more...)

arXiv.org Artificial Intelligence

2502.20139

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany > Lower Saxony (0.14)
South America > Brazil > São Paulo (0.04)
(26 more...)

Genre:

Research Report > New Finding (0.86)
Research Report > Experimental Study (0.68)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)

Add feedback

Improving Sickle Cell Disease Classification: A Fusion of Conventional Classifiers, Segmented Images, and Convolutional Neural Networks

Cardoso, Victor Júnio Alcântara, Moreira, Rodrigo, Mari, João Fernando, Moreira, Larissa Ferreira Rodrigues

arXiv.org Artificial IntelligenceDec-23-2024

Sickle cell anemia, which is characterized by abnormal erythrocyte morphology, can be detected using microscopic images. Computational techniques in medicine enhance the diagnosis and treatment efficiency. However, many computational techniques, particularly those based on Convolutional Neural Networks (CNNs), require high resources and time for training, highlighting the research opportunities in methods with low computational overhead. In this paper, we propose a novel approach combining conventional classifiers, segmented images, and CNNs for the automated classification of sickle cell disease. We evaluated the impact of segmented images on classification, providing insight into deep learning integration. Our results demonstrate that using segmented images and CNN features with an SVM achieves an accuracy of 96.80%. This finding is relevant for computationally efficient scenarios, paving the way for future research and advancements in medical-image analysis.

artificial intelligence, classifier, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.5753/eniac.2023.234076

2412.17975

Country:

South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
South America > Brazil > Rio de Janeiro > Niterói (0.04)
South America > Brazil > Minas Gerais (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Hematology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)

Add feedback

System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT

Bhatia, Shreya, Gandhi, Tarushi, Kumar, Dhruv, Jalote, Pankaj

arXiv.org Artificial IntelligenceDec-4-2024

System testing is essential in any software development project to ensure that the final products meet the requirements. Creating comprehensive test cases for system testing from requirements is often challenging and time-consuming. This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents. In this study, we collected the SRS documents of five software engineering projects containing functional and non-functional requirements, which were implemented, tested, and delivered by respective developer teams. For generating test case designs, we used ChatGPT-4o Turbo model. We employed prompt-chaining, starting with an initial context-setting prompt, followed by prompts to generate test cases for each use case. We assessed the quality of the generated test case designs through feedback from the same developer teams as mentioned above. Our experiments show that about 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant. Notably, 15 percent of the valid test cases were previously not considered by developers in their testing. We also tasked ChatGPT with identifying redundant test cases, which were subsequently validated by the respective developers to identify false positives and to uncover any redundant test cases that may have been missed by the developers themselves. This study highlights the potential of leveraging LLMs for test generation from the Requirements Specification document and also for assisting developers in quickly identifying and addressing redundancies, ultimately improving test suite quality and efficiency of the testing procedure.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.03693

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > District of Columbia > Washington (0.05)
Asia > India > NCT > Delhi (0.05)
(7 more...)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback